
For testing macb stall (mainline) #7472

Draft
nbuchwitz wants to merge 5 commits into raspberrypi:rpi-6.18.y from nbuchwitz:devel/macb-tx-stall-cleanup

Conversation

@nbuchwitz

Contributor

No description provided.

nbuchwitz and others added 5 commits July 3, 2026 13:03
…rn_ratelimited"

This reverts commit b2f7eec.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
…_tx_poll"

This reverts commit 60fc80b.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
This reverts commit 79dc190.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
This reverts commit ff6914e.

Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
…write

The MACB found in the Raspberry Pi RP1 suffers from sporadic stalls on
the TX queue.
While the exact root cause is not yet fully understood, it is likely
related to a hardware issue where a TSTART write to the NCR register
is missed, preventing the transmission from being kicked off.

Implement a timeout callback to handle TX queue stalls, triggering the
existing restart mechanism to recover.

Link: https://lore.kernel.org/all/20260514215459.36109-1-lukasz@raczylo.com/
Fixes: dc110d1 ("net: cadence: macb: Add support for Raspberry Pi RP1 ethernet controller")
Signed-off-by: Lukasz Raczylo <lukasz@raczylo.com>
Co-developed-by: Steffen Jaeckel <sjaeckel@suse.de>
Signed-off-by: Steffen Jaeckel <sjaeckel@suse.de>
Co-developed-by: Andrea della Porta <andrea.porta@suse.com>
Signed-off-by: Andrea della Porta <andrea.porta@suse.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Reviewed-by: Théo Lebrun <theo.lebrun@bootlin.com>
Link: https://patch.msgid.link/468f480454a314303bac6a54780b153f689f2267.1781598350.git.andrea.porta@suse.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit e438ec3)
@pelwell

pelwell commented Jul 3, 2026

Contributor

I'm running with this now...

@pelwell

pelwell commented Jul 3, 2026

Contributor

2 hours later and I've not hit a timeout (there's a pr_err in my build) but that doesn't surprise me - we never saw the stall - and no more error messages because the special stall detection code has gone.

@satmandu

satmandu commented Jul 3, 2026

Is there actually a stall happening that was previously masked?

@nbuchwitz

Contributor Author

I have seen this stall once or twice in my lab, so I can confirm that it exists. Andrea pointed me to a potential reproducer, but I was never able to provoke the stall with it.

https://lore.kernel.org/netdev/aiwDsd-1IzeBzcaD@apocalypse/

@pelwell

pelwell commented Jul 3, 2026

Contributor

I think it's more likely to be a false positive

@pelwell

pelwell commented Jul 3, 2026

Contributor

I don't think either of us are questioning whether stalls are a problem - the question is whether the messages that have started appearing are indications of a real fault having occurred, or an accidental triggering due to a fault in the detection logic.

@nbuchwitz

nbuchwitz commented Jul 3, 2026

Contributor Author

Let's give this some broader testing. As you already know I'm always in favor of aligning with mainline (where possible).

@satmandu, any chance that you can test this too? You can download the kernel with rpi-update pulls/7472 (the usual precautions apply etc.)

@satmandu

satmandu commented Jul 3, 2026

I'm not seeing the messages any more with this kernel.

https://pastebin.com/mLJ1dTUP

